Skip to content

Conversation

@kiszk
Copy link
Member

@kiszk kiszk commented Mar 8, 2018

What changes were proposed in this pull request?

This PR fixes an incorrect comparison in SQL between timestamp and date. This is because both of them are casted to string and then are compared lexicographically. This implementation shows false regarding this query spark.sql("select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)").show.

This PR shows true for this query by casting date("2017-03-01") to timestamp("2017-03-01 00:00:00").

(Please fill in changes proposed in this fix)

How was this patch tested?

Added new UTs to TypeCoercionSuite.

@kiszk
Copy link
Member Author

kiszk commented Mar 8, 2018

If this approach is fine, I will create tests for .sql.

@SparkQA
Copy link

SparkQA commented Mar 8, 2018

Test build #88101 has finished for PR 20774 at commit 21368b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Mar 9, 2018

cc: @gatorsmile

Copy link
Member

@gatorsmile gatorsmile Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this PR makes sense, we still need a SQLConf. Also add this behavior change into the migration guide in the SQL doc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I will add a SQLConf. Is there any suggestion for a name of this entry?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it means TimeStamp('2017-03-01 00:00:01') eq Date('2017-03-01') = false? Looks a bit weird.

At least this comparison does not get true in Hive:

hive> select cast('2017-03-01 00:00:00' as timestamp) = cast('2017-03-01' as date);
OK
false

Copy link
Member Author

@kiszk kiszk Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your try.

The current implementation follows this answer.

Which version of hive did you use?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hive 2.1.

Copy link
Member Author

@kiszk kiszk Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks
@dongjoon-hyun are you pointing out this HIVE JIRA entry in this comment? Or other HIVE JIRA entries?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems it's fixed in Hive 2.2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, but @dongjoon-hyun 's comment said it is fixed at 2.0.

@SparkQA
Copy link

SparkQA commented Mar 18, 2018

Test build #88346 has finished for PR 20774 at commit a16deaa.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class PromoteStrings(conf: SQLConf) extends TypeCoercionRule
  • case class InConversion(conf: SQLConf) extends TypeCoercionRule

@kiszk
Copy link
Member Author

kiszk commented Mar 18, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Mar 18, 2018

Test build #88348 has finished for PR 20774 at commit a16deaa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class PromoteStrings(conf: SQLConf) extends TypeCoercionRule
  • case class InConversion(conf: SQLConf) extends TypeCoercionRule

@SparkQA
Copy link

SparkQA commented Mar 18, 2018

Test build #88357 has finished for PR 20774 at commit 5fbbc30.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Mar 19, 2018

cc @gatorsmile

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified it in DB2. It works as what this PR proposes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  private def findCommonTypeForBinaryComparison(
      dt1: DataType, dt2: DataType, conf: SQLConf): Option[DataType] = (dt1, dt2) match {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two more spaces before if

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  test("binary comparison with string promotion") {
    val rule = TypeCoercion.PromoteStrings(conf)
    ruleTest(rule,
      GreaterThan(Literal("123"), Literal(1)),
      GreaterThan(Cast(Literal("123"), IntegerType), Literal(1)))
    ruleTest(rule,
      LessThan(Literal(true), Literal("123")),
      LessThan(Literal(true), Cast(Literal("123"), BooleanType)))
    ruleTest(rule,
      EqualTo(Literal(Array(1, 2)), Literal("123")),
      EqualTo(Literal(Array(1, 2)), Literal("123")))
    ruleTest(rule,
      GreaterThan(Literal("1.5"), Literal(BigDecimal("0.5"))),
      GreaterThan(Cast(Literal("1.5"), DoubleType), Cast(Literal(BigDecimal("0.5")),
        DoubleType)))
    Seq(true, false).foreach { convertToTS =>
      withSQLConf("spark.sql.hive.compareDateTimestampInTimestamp" -> convertToTS.toString) {
        val date0301 = Literal(java.sql.Date.valueOf("2017-03-01"))
        val timestamp0301000000 = Literal(Timestamp.valueOf("2017-03-01 00:00:00"))
        val timestamp0301000001 = Literal(Timestamp.valueOf("2017-03-01 00:00:01"))
        if (convertToTS) {
          // `Date` should be treated as timestamp at 00:00:00 See SPARK-23549
          ruleTest(rule, EqualTo(date0301, timestamp0301000000),
            EqualTo(Cast(date0301, TimestampType), timestamp0301000000))
          ruleTest(rule, LessThan(date0301, timestamp0301000001),
            LessThan(Cast(date0301, TimestampType), timestamp0301000001))
        } else {
          ruleTest(rule, LessThan(date0301, timestamp0301000000),
            LessThan(Cast(date0301, StringType), Cast(timestamp0301000000, StringType)))
          ruleTest(rule, LessThan(date0301, timestamp0301000001),
            LessThan(Cast(date0301, StringType), Cast(timestamp0301000001, StringType)))
        }
      }
    }
  }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little hard to track the exact changes. Let us move all the new tests to the end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 it is really confusing to look at the diff

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to add it to migration guide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not be under hive, because it is not only for hive

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spark.sql.typeCoercion.compareDateTimestampInTimestamp

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps mention this config will be removed in spark 3.0.

(on a related note we should look at those configs for backward compatibility and consider removing them in 3.0)

@SparkQA
Copy link

SparkQA commented Mar 20, 2018

Test build #88403 has finished for PR 20774 at commit a3dc357.

  • This patch fails due to an unknown error code, -9.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 20, 2018

Test build #88412 has finished for PR 20774 at commit 4df6513.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class PromoteStrings(conf: SQLConf) extends TypeCoercionRule
  • case class InConversion(conf: SQLConf) extends TypeCoercionRule

@kiszk
Copy link
Member Author

kiszk commented Mar 20, 2018

Retest this please

@SparkQA
Copy link

SparkQA commented Mar 20, 2018

Test build #88425 has finished for PR 20774 at commit 4df6513.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class PromoteStrings(conf: SQLConf) extends TypeCoercionRule
  • case class InConversion(conf: SQLConf) extends TypeCoercionRule

@kiszk
Copy link
Member Author

kiszk commented Mar 20, 2018

Retest this please

1 similar comment
@kiszk
Copy link
Member Author

kiszk commented Mar 20, 2018

Retest this please

@SparkQA
Copy link

SparkQA commented Mar 20, 2018

Test build #88433 has finished for PR 20774 at commit 4df6513.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class PromoteStrings(conf: SQLConf) extends TypeCoercionRule
  • case class InConversion(conf: SQLConf) extends TypeCoercionRule

@SparkQA
Copy link

SparkQA commented Mar 20, 2018

Test build #88434 has finished for PR 20774 at commit 4df6513.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class PromoteStrings(conf: SQLConf) extends TypeCoercionRule
  • case class InConversion(conf: SQLConf) extends TypeCoercionRule

@SparkQA
Copy link

SparkQA commented Mar 25, 2018

Test build #88574 has finished for PR 20774 at commit 5ca2341.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class PromoteStrings(conf: SQLConf) extends TypeCoercionRule
  • case class InConversion(conf: SQLConf) extends TypeCoercionRule

@gatorsmile
Copy link
Member

LGTM

Thanks! Merged to master

@asfgit asfgit closed this in e4bec7c Mar 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants